Pairwise Probabilistic Clustering Using Evidence Accumulation
Abstract
• $O = \{1, \dots, n\}$ is the set of data objects to cluster.
• $K$ is the number of desired classes.
• $E = \{cl_i\}_{i=1}^{N}$ is the ensemble of $N$ clusterings of $O$. The ensemble is obtained by running different algorithms with different parametrizations on (possibly) sub-sampled versions of the original dataset.
• Each clustering is a function $cl_i : O_i \to \{1, \dots, K_i\}$ from the set of objects $O_i \subseteq O$ to a class label.
• $\Omega_{ij} = \{p \in \{1, \dots, N\} : i, j \in O_p\}$ is the set of indices of the clusterings in which both $i$ and $j$ have been classified.
• $N_{ij} = |\Omega_{ij}|$.
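The quantities defined above can be illustrated with a short Python sketch. It counts $N_{ij}$ for every pair of objects and, as an assumption not stated in this abstract, also computes a co-association value as the fraction of those $N_{ij}$ clusterings that put $i$ and $j$ in the same cluster. The function name `coassociation` and the representation of each clustering $cl_p$ as a dict from object index to label (covering only the sampled subset $O_p$) are hypothetical, and objects are indexed from 0 rather than 1.

```python
from itertools import combinations

def coassociation(ensemble, n):
    """Hypothetical helper. ensemble: list of dicts, one per clustering cl_p,
    mapping each object in the sampled subset O_p to its cluster label."""
    # N[i][j] = N_ij = |Omega_ij|: number of clusterings that classified both i and j
    N = [[0] * n for _ in range(n)]
    # S[i][j]: how many of those clusterings put i and j in the same cluster
    S = [[0] * n for _ in range(n)]
    for cl in ensemble:
        objs = sorted(cl)
        for i in objs:
            N[i][i] += 1
            S[i][i] += 1
        for i, j in combinations(objs, 2):
            N[i][j] += 1
            N[j][i] += 1
            if cl[i] == cl[j]:
                S[i][j] += 1
                S[j][i] += 1
    # Assumed normalization: C_ij = S_ij / N_ij, left undefined (None) when N_ij == 0
    C = [[S[i][j] / N[i][j] if N[i][j] else None for j in range(n)]
         for i in range(n)]
    return N, C

# Three clusterings of four objects; each one sees only a subset of the objects.
ensemble = [
    {0: 1, 1: 1, 2: 2},
    {0: 1, 1: 2},
    {1: 1, 2: 1, 3: 2},
]
N, C = coassociation(ensemble, 4)
print(N[0][1], C[0][1], C[0][3])  # 2 0.5 None
```

Dividing by $N_{ij}$ rather than by $N$ keeps pairs that were rarely sampled together from looking artificially dissimilar; objects 0 and 3 are never classified in the same run, so no co-association can be estimated for them.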
Similar resources
Evidence Accumulation Clustering using Pairwise Constraints
Recent work on constrained data clustering has shown that the incorporation of pairwise constraints, such as must-link and cannot-link constraints, increases the accuracy of single-run data clustering methods. It was also shown that the quality of a consensus partition, resulting from the combination of multiple data partitions, is usually superior to the quality of the partitions produced b...
Combining Data Clusterings with Instance Level Constraints
Recent work has focused on the incorporation of a priori knowledge into the data clustering process, in the form of pairwise constraints, aiming to improve clustering quality and find appropriate clustering solutions for specific tasks or interests. In this work, we integrate must-link and cannot-link constraints into the cluster ensemble framework. Two algorithms for combining multiple data partit...
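The two abstracts above rely on must-link and cannot-link constraints without spelling out their meaning. A must-link pair should end up in the same cluster, and a cannot-link pair in different clusters. The following minimal sketch only checks a given partition against such constraints; it is not either paper's combination algorithm (the snippets are truncated before describing them), and the helper name `count_violations` is hypothetical.

```python
def count_violations(labels, must_link, cannot_link):
    """Hypothetical helper. labels[i] is the cluster of object i;
    must_link / cannot_link are iterables of (i, j) object pairs."""
    # A must-link pair is violated when its objects land in different clusters
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its objects share a cluster
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml, cl

labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (2, 4)]     # (2, 4) is violated
cannot_link = [(0, 2), (2, 3)]   # (2, 3) is violated
print(count_violations(labels, must_link, cannot_link))  # (1, 1)
```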
Pairwise clustering based on the mutual-information criterion
Pairwise clustering methods partition a dataset using pairwise similarity between data-points. The pairwise similarity matrix can be used to define a Markov random walk on the data points. This view forms a probabilistic interpretation of spectral clustering methods. We utilize this probabilistic model to define a novel clustering cost function that is based on maximizing the mutual information...
Information Theoretic Pairwise Clustering
In this paper we develop an information-theoretic approach for pairwise clustering. The Laplacian of the pairwise similarity matrix can be used to define a Markov random walk on the data points. This view forms a probabilistic interpretation of spectral clustering methods. We utilize this probabilistic model to define a novel clustering cost function that is based on maximizing the mutual infor...
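The random-walk view mentioned in the two abstracts above can be made concrete with a short sketch, assuming a symmetric, non-negative similarity matrix $W$ in which every row has a positive sum. Normalizing each row by its degree $d_i = \sum_j W_{ij}$ gives the transition matrix $P = D^{-1}W$ (the random-walk Laplacian is $L_{rw} = I - P$), and for symmetric $W$ the stationary distribution is $\pi_i = d_i / \sum_k d_k$. The mutual-information cost function itself is not reproduced here, and the helper name `random_walk` is hypothetical.

```python
def random_walk(W):
    """Hypothetical helper. W: symmetric non-negative similarity matrix
    (list of lists) whose rows all have a positive sum."""
    d = [sum(row) for row in W]                      # degrees d_i = sum_j W_ij
    P = [[w / d[i] for w in row] for i, row in enumerate(W)]  # P = D^{-1} W
    total = sum(d)
    pi = [di / total for di in d]                    # stationary: pi_i = d_i / sum_k d_k
    return P, pi

W = [[0, 2, 1],
     [2, 0, 1],
     [1, 1, 0]]
P, pi = random_walk(W)
print(pi)  # [0.375, 0.375, 0.25]
```

Each row of `P` sums to 1, so `P[i][j]` reads as the probability of stepping from point `i` to point `j`; pairs with higher similarity are more likely to be visited one after the other.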
Weighted Evidence Accumulation Clustering Using Subsampling
We introduce an approach based on evidence accumulation (EAC) for combining partitions in a clustering ensemble. EAC uses a voting mechanism to produce a co-association matrix from the pairwise associations obtained from N partitions, where each partition has equal weight in the combination process. By applying a clustering algorithm to this co-association matrix, we obtain the final data...
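The equal-weight voting scheme described above can be sketched in a few lines of Python. Each of the N partitions casts one vote for every pair of objects it places together, so the co-association value of a pair is its vote count divided by N. The snippet is truncated before naming the final clustering algorithm; as an assumption, this sketch uses single-link merging (one common choice for extracting a partition from a co-association matrix), stopping once k clusters remain. The function name `eac` and its parameters are hypothetical.

```python
from itertools import combinations

def eac(partitions, k):
    """Hypothetical helper. partitions: list of N label lists over the same
    n objects. Returns a final partition with k clusters (labels 0..k-1)."""
    n, N = len(partitions[0]), len(partitions)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")
    # Equal-weight voting: co-association C_ij = votes_ij / N
    votes = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in partitions:
        for i, j in votes:
            if labels[i] == labels[j]:
                votes[(i, j)] += 1
    C = {pair: v / N for pair, v in votes.items()}

    # Assumed final step: single-link merging in order of decreasing
    # co-association (Kruskal-style, tracked with union-find)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    clusters = n
    for i, j in sorted(C, key=lambda pair: -C[pair]):
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    # Relabel cluster roots as 0..k-1 in order of first appearance
    roots = {}
    return [roots.setdefault(find(x), len(roots)) for x in range(n)]

partitions = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 2, 2, 0],
]
print(eac(partitions, 2))  # [0, 0, 1, 1, 1]
```

Because every partition contributes the same vote, the combination ignores which partitions are more reliable; the weighted variant named in the title above departs from exactly this equal-weight assumption.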